Overview
Brought to you by YData
Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 427 | 422 |
| Missing cells (%) | 8.0% | 7.9% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
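The percentage and per-record rows above can be recomputed from the other rows. The following is a minimal Python sketch of that arithmetic, assuming the missing-cell percentage is the number of missing cells divided by variables times observations and that 1 KiB is 1024 bytes. The function names are illustrative and are not part of the report or of ydata-profiling.

```python
def missing_cell_pct(missing_cells, n_variables, n_observations):
    """Share of empty cells among all cells, in percent."""
    return 100.0 * missing_cells / (n_variables * n_observations)


def avg_record_size_bytes(total_kib, n_observations):
    """Average bytes per row, taking 1 KiB = 1024 B (assumption)."""
    return total_kib * 1024 / n_observations


# Figures taken from the Dataset statistics table
print(f"Dataset A missing cells: {missing_cell_pct(427, 12, 446):.1f}%")
print(f"Dataset B missing cells: {missing_cell_pct(422, 12, 446):.1f}%")
print(f"Average record size: {avg_record_size_bytes(45.3, 446):.1f} B")
```

Under these assumptions the sketch gives 8.0%, 7.9% and 104.0 B, which agrees with the table.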
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 4 | 4 |
| Text | 3 | 3 |
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Age has 88 (19.7%) missing values | Age has 83 (18.6%) missing values | Missing |
| Cabin has 338 (75.8%) missing values | Cabin has 338 (75.8%) missing values | Missing |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 311 (69.7%) zeros | SibSp has 308 (69.1%) zeros | Zeros |
| Parch has 342 (76.7%) zeros | Parch has 345 (77.4%) zeros | Zeros |
| Fare has 11 (2.5%) zeros | Fare has 9 (2.0%) zeros | Zeros |
| Alert not present in this dataset | Sex is highly overall correlated with Survived | High correlation |
| Alert not present in this dataset | Survived is highly overall correlated with Sex | High correlation |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2025-03-11 15:19:02.612077 | 2025-03-11 15:19:05.085419 |
| Analysis finished | 2025-03-11 15:19:05.082227 | 2025-03-11 15:19:07.626429 |
| Duration | 2.47 seconds | 2.54 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
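The Duration row is the difference between the two timestamps. As a small check, the sketch below parses the start and finish times with the Python standard library and subtracts them. The helper name `duration_seconds` is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the Reproduction table
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started, finished):
    """Elapsed time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dataset_a = duration_seconds("2025-03-11 15:19:02.612077", "2025-03-11 15:19:05.082227")
dataset_b = duration_seconds("2025-03-11 15:19:05.085419", "2025-03-11 15:19:07.626429")
print(f"Dataset A: {dataset_a:.2f} seconds")
print(f"Dataset B: {dataset_b:.2f} seconds")
```

Rounded to two decimals, the results match the reported 2.47 and 2.54 seconds.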
Variables
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 433 | 446.37444 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| Maximum | 884 | 891 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 3 |
| 5-th percentile | 52.5 | 41.25 |
| Q1 | 216.25 | 232.75 |
| median | 427 | 439.5 |
| Q3 | 647.25 | 671.75 |
| 95-th percentile | 832.75 | 851.5 |
| Maximum | 884 | 891 |
| Range | 883 | 888 |
| Interquartile range (IQR) | 431 | 439 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 252.47345 | 257.66194 |
| Coefficient of variation (CV) | 0.58307957 | 0.57723275 |
| Kurtosis | -1.1647224 | -1.194053 |
| Mean | 433 | 446.37444 |
| Median Absolute Deviation (MAD) | 214.5 | 221 |
| Skewness | 0.060243897 | 0.049195146 |
| Sum | 193118 | 199083 |
| Variance | 63742.845 | 66389.677 |
| Monotonicity | Not monotonic | Not monotonic |
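Several rows in the quantile and descriptive tables are derived from others: the mean is the sum over the number of observations, the variance is the squared standard deviation, the coefficient of variation is the standard deviation over the mean, and the range and IQR are simple differences. The sketch below recomputes them for PassengerId from the reported inputs. It is an illustration only, and `derived_stats` is a hypothetical helper, not a ydata-profiling function.

```python
def derived_stats(n, total, std, minimum, maximum, q1, q3):
    """Recompute summary rows that follow from other reported values."""
    mean = total / n
    return {
        "mean": mean,
        "variance": std ** 2,
        "cv": std / mean,
        "range": maximum - minimum,
        "iqr": q3 - q1,
    }


# Inputs copied from the PassengerId tables
dataset_a = derived_stats(446, 193118, 252.47345, 1, 884, 216.25, 647.25)
dataset_b = derived_stats(446, 199083, 257.66194, 3, 891, 232.75, 671.75)

for name, stats in (("Dataset A", dataset_a), ("Dataset B", dataset_b)):
    print(name, {key: round(value, 5) for key, value in stats.items()})
```

Small differences in the last digits of the variance are expected, because the standard deviation in the table is itself rounded.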
Histogram with fixed size bins (bins=50)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 401 | 1 | 0.2% |
| 561 | 1 | 0.2% |
| 770 | 1 | 0.2% |
| 373 | 1 | 0.2% |
| 307 | 1 | 0.2% |
| 76 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 791 | 1 | 0.2% |
| 211 | 1 | 0.2% |
| 845 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 56 | 1 | 0.2% |
| 37 | 1 | 0.2% |
| 767 | 1 | 0.2% |
| 525 | 1 | 0.2% |
| 684 | 1 | 0.2% |
| 875 | 1 | 0.2% |
| 723 | 1 | 0.2% |
| 470 | 1 | 0.2% |
| 189 | 1 | 0.2% |
| 444 | 1 | 0.2% |
| Other values (436) | 436 | 97.8% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 9 | 1 | 0.2% |
| 10 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 17 | 1 | 0.2% |
| 18 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
| 11 | 1 | 0.2% |
| 12 | 1 | 0.2% |
| 13 | 1 | 0.2% |
| 15 | 1 | 0.2% |
| 17 | 1 | 0.2% |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 0 | 0 |
| 3rd row | 0 | 0 |
| 4th row | 1 | 0 |
| 5th row | 0 | 1 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 278 | 62.3% |
| 1 | 168 | 37.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 278 | 62.3% |
| 1 | 168 | 37.7% |
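The Frequency (%) column is each count divided by the total number of rows in the table. A minimal Python sketch of that calculation for the Survived counts follows. It assumes rounding to one decimal place, and `frequency_pct` is an illustrative name rather than a library function.

```python
def frequency_pct(counts):
    """Map each value to its share of the total count, in percent (1 decimal)."""
    total = sum(counts.values())
    return {value: round(100.0 * n / total, 1) for value, n in counts.items()}


# Counts from the Survived Common Values table (identical in both datasets)
survived = {"0": 278, "1": 168}
print(frequency_pct(survived))
```

With 446 observations this yields 62.3% for the value 0 and 37.7% for the value 1.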
Plots: histogram of lengths of the category; Common Values (Plot) for Dataset A and Dataset B
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 278 | 62.3% |
| 1 | 168 | 37.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 278 | 62.3% |
| 1 | 168 | 37.7% |
Most occurring categories
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 446 | 100.0% |
| Dataset B | (unknown) | 446 | 100.0% |
Most occurring scripts
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 446 | 100.0% |
| Dataset B | (unknown) | 446 | 100.0% |
Most occurring blocks
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 446 | 100.0% |
| Dataset B | (unknown) | 446 | 100.0% |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 3 | 1 |
| 3rd row | 3 | 3 |
| 4th row | 1 | 3 |
| 5th row | 3 | 2 |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 237 | 53.1% |
| 1 | 108 | 24.2% |
| 2 | 101 | 22.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 246 | 55.2% |
| 1 | 114 | 25.6% |
| 2 | 86 | 19.3% |
Plots: histogram of lengths of the category; Common Values (Plot) for Dataset A and Dataset B
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 237 | 53.1% |
| 1 | 108 | 24.2% |
| 2 | 101 | 22.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 246 | 55.2% |
| 1 | 114 | 25.6% |
| 2 | 86 | 19.3% |
Most occurring categories
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 446 | 100.0% |
| Dataset B | (unknown) | 446 | 100.0% |
Most occurring scripts
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 446 | 100.0% |
| Dataset B | (unknown) | 446 | 100.0% |
Most occurring blocks
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 446 | 100.0% |
| Dataset B | (unknown) | 446 | 100.0% |
Name
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 61 |
| Median length | 50 | 46 |
| Mean length | 26.742152 | 26.596413 |
| Min length | 13 | 12 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Morrow, Mr. Thomas Rowan | Mamee, Mr. Hanna |
| 2nd row | Gronnestad, Mr. Daniel Danielsen | Brewe, Dr. Arthur Jackson |
| 3rd row | Beavan, Mr. William Thomas | Kassem, Mr. Fared |
| 4th row | Fleming, Miss. Margaret | Goodwin, Mr. Charles Edward |
| 5th row | Moen, Mr. Sigurd Hansen | Abelson, Mrs. Samuel (Hannah Wizosky) |
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| mr | 268 | 14.8% |
| miss | 96 | 5.3% |
| mrs | 62 | 3.4% |
| william | 34 | 1.9% |
| john | 18 | 1.0% |
| henry | 17 | 0.9% |
| james | 16 | 0.9% |
| master | 16 | 0.9% |
| george | 14 | 0.8% |
| charles | 12 | 0.7% |
| Other values (872) | 1255 | 69.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| mr | 269 | 15.0% |
| miss | 95 | 5.3% |
| mrs | 54 | 3.0% |
| william | 36 | 2.0% |
| henry | 18 | 1.0% |
| john | 17 | 0.9% |
| master | 17 | 0.9% |
| george | 16 | 0.9% |
| james | 13 | 0.7% |
| charles | 13 | 0.7% |
| Other values (899) | 1244 | 69.4% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1363 | 11.4% |
| r | 971 | 8.1% |
| e | 877 | 7.4% |
| a | 823 | 6.9% |
| i | 673 | 5.6% |
| s | 630 | 5.3% |
| n | 611 | 5.1% |
| M | 565 | 4.7% |
| l | 543 | 4.6% |
| o | 462 | 3.9% |
| Other values (50) | 4409 | 37.0% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1346 | 11.3% |
| r | 949 | 8.0% |
| e | 851 | 7.2% |
| a | 788 | 6.6% |
| i | 678 | 5.7% |
| s | 659 | 5.6% |
| n | 632 | 5.3% |
| M | 541 | 4.6% |
| l | 532 | 4.5% |
| o | 508 | 4.3% |
| Other values (50) | 4378 | 36.9% |
Most occurring categories
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 11927 | 100.0% |
| Dataset B | (unknown) | 11862 | 100.0% |
Most occurring scripts
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 11927 | 100.0% |
| Dataset B | (unknown) | 11862 | 100.0% |
Most occurring blocks
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 11927 | 100.0% |
| Dataset B | (unknown) | 11862 | 100.0% |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.7085202 | 4.6816143 |
| Min length | 4 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | female | male |
| 5th row | male | female |
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| male | 288 | 64.6% |
| female | 158 | 35.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| male | 294 | 65.9% |
| female | 152 | 34.1% |
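The mean length reported for Sex can be recovered from the category counts: weight the length of each category label by its count and divide by the number of rows. The sketch below does this for both datasets. The helper `length_summary` is hypothetical and only illustrates the arithmetic.

```python
def length_summary(counts):
    """Return (total characters, mean label length) for a value-count mapping."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    total_rows = sum(counts.values())
    return total_chars, total_chars / total_rows


# Counts from the Sex Common Values table
chars_a, mean_a = length_summary({"male": 288, "female": 158})
chars_b, mean_b = length_summary({"male": 294, "female": 152})
print(chars_a, round(mean_a, 7))
print(chars_b, round(mean_b, 7))
```

The totals of 2100 and 2088 characters are the same figures the report lists under the character-level sections for Sex, and the means agree with the Length table (4.7085202 and 4.6816143).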
Plots: histogram of lengths of the category; Common Values (Plot) for Dataset A and Dataset B
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| e | 604 | 28.8% |
| m | 446 | 21.2% |
| a | 446 | 21.2% |
| l | 446 | 21.2% |
| f | 158 | 7.5% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| e | 598 | 28.6% |
| m | 446 | 21.4% |
| a | 446 | 21.4% |
| l | 446 | 21.4% |
| f | 152 | 7.3% |
Most occurring categories
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 2100 | 100.0% |
| Dataset B | (unknown) | 2088 | 100.0% |
Most occurring scripts
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 2100 | 100.0% |
| Dataset B | (unknown) | 2088 | 100.0% |
Most occurring blocks
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 2100 | 100.0% |
| Dataset B | (unknown) | 2088 | 100.0% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 74 | 75 |
| Distinct (%) | 20.7% | 20.7% |
| Missing | 88 | 83 |
| Missing (%) | 19.7% | 18.6% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.750698 | 29.955923 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.67 |
| Maximum | 80 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.67 |
| 5-th percentile | 4.85 | 4 |
| Q1 | 21 | 21 |
| median | 29 | 28 |
| Q3 | 38 | 38 |
| 95-th percentile | 54 | 57.8 |
| Maximum | 80 | 80 |
| Range | 79.58 | 79.33 |
| Interquartile range (IQR) | 17 | 17 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.074854 | 14.624023 |
| Coefficient of variation (CV) | 0.47309323 | 0.48818468 |
| Kurtosis | 0.11118063 | 0.42182478 |
| Mean | 29.750698 | 29.955923 |
| Median Absolute Deviation (MAD) | 8.25 | 8 |
| Skewness | 0.27959062 | 0.46640687 |
| Sum | 10650.75 | 10874 |
| Variance | 198.10151 | 213.86204 |
| Monotonicity | Not monotonic | Not monotonic |
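Because Age has missing values, the figures above use two different denominators. The numbers are consistent with Missing (%) being taken over all 446 observations, while the mean and Distinct (%) are taken over the non-missing values only. The sketch below checks that reading; it is an interpretation of the reported numbers, and `non_missing_stats` is a hypothetical helper.

```python
def non_missing_stats(n_obs, n_missing, total, n_distinct):
    """Recompute Age summary rows, separating the two denominators."""
    n_valid = n_obs - n_missing  # rows that actually carry a value
    return {
        "n_valid": n_valid,
        "missing_pct": 100.0 * n_missing / n_obs,     # over all rows
        "mean": total / n_valid,                      # over non-missing rows
        "distinct_pct": 100.0 * n_distinct / n_valid, # over non-missing rows
    }


age_a = non_missing_stats(446, 88, 10650.75, 74)
age_b = non_missing_stats(446, 83, 10874, 75)
for name, stats in (("Dataset A", age_a), ("Dataset B", age_b)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```

Dividing the sum by all 446 rows instead would give a mean near 23.9 for Dataset A, which does not match the reported 29.750698.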
Histogram with fixed size bins (bins=50)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 16 | 3.6% |
| 22 | 15 | 3.4% |
| 36 | 14 | 3.1% |
| 18 | 14 | 3.1% |
| 30 | 13 | 2.9% |
| 35 | 13 | 2.9% |
| 21 | 12 | 2.7% |
| 31 | 12 | 2.7% |
| 28 | 11 | 2.5% |
| 34 | 10 | 2.2% |
| Other values (64) | 228 | 51.1% |
| (Missing) | 88 | 19.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 24 | 20 | 4.5% |
| 18 | 15 | 3.4% |
| 28 | 13 | 2.9% |
| 36 | 13 | 2.9% |
| 30 | 12 | 2.7% |
| 25 | 12 | 2.7% |
| 26 | 11 | 2.5% |
| 20 | 11 | 2.5% |
| 21 | 11 | 2.5% |
| 40 | 10 | 2.2% |
| Other values (65) | 235 | 52.7% |
| (Missing) | 83 | 18.6% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 4 | 0.9% |
| 3 | 3 | 0.7% |
| 4 | 5 | 1.1% |
| 5 | 2 | 0.4% |
| 6 | 3 | 0.7% |
| 7 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0.67 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 1 | 2 | 0.4% |
| 2 | 7 | 1.6% |
| 3 | 2 | 0.4% |
| 4 | 6 | 1.3% |
| 6 | 3 | 0.7% |
| 7 | 1 | 0.2% |
| 8 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.51345291 | 0.53363229 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 311 | 308 |
| Zeros (%) | 69.7% | 69.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 2 | 2.75 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1469738 | 1.1775002 |
| Coefficient of variation (CV) | 2.2338442 | 2.206576 |
| Kurtosis | 20.43215 | 18.648508 |
| Mean | 0.51345291 | 0.53363229 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 4.0105825 | 3.8637579 |
| Sum | 229 | 238 |
| Variance | 1.3155489 | 1.3865068 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 311 | 69.7% |
| 1 | 97 | 21.7% |
| 2 | 17 | 3.8% |
| 3 | 8 | 1.8% |
| 4 | 6 | 1.3% |
| 8 | 5 | 1.1% |
| 5 | 2 | 0.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 308 | 69.1% |
| 1 | 99 | 22.2% |
| 2 | 16 | 3.6% |
| 3 | 9 | 2.0% |
| 8 | 5 | 1.1% |
| 4 | 5 | 1.1% |
| 5 | 4 | 0.9% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 311 | 69.7% |
| 1 | 97 | 21.7% |
| 2 | 17 | 3.8% |
| 3 | 8 | 1.8% |
| 4 | 6 | 1.3% |
| 5 | 2 | 0.4% |
| 8 | 5 | 1.1% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 308 | 69.1% |
| 1 | 99 | 22.2% |
| 2 | 16 | 3.6% |
| 3 | 9 | 2.0% |
| 4 | 5 | 1.1% |
| 5 | 4 | 0.9% |
| 8 | 5 | 1.1% |
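Since SibSp takes only seven distinct values, its value counts are a complete frequency table, and the Sum and Mean rows can be rebuilt from them. The following sketch shows that calculation; `sum_and_mean` is an illustrative helper name.

```python
def sum_and_mean(freq):
    """Return (n, sum, mean) for a complete value -> count frequency table."""
    n = sum(freq.values())
    total = sum(value * count for value, count in freq.items())
    return n, total, total / n


# Value counts for SibSp as listed in the report
sibsp_a = {0: 311, 1: 97, 2: 17, 3: 8, 4: 6, 5: 2, 8: 5}
sibsp_b = {0: 308, 1: 99, 2: 16, 3: 9, 4: 5, 5: 4, 8: 5}

print(sum_and_mean(sibsp_a))
print(sum_and_mean(sibsp_b))
```

Both tables add up to 446 observations, with sums of 229 and 238, in line with the Descriptive statistics table.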
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 6 |
| Distinct (%) | 1.6% | 1.3% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.37443946 | 0.35201794 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 5 |
| Zeros | 342 | 345 |
| Zeros (%) | 76.7% | 77.4% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 0 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 5 |
| Range | 6 | 5 |
| Interquartile range (IQR) | 0 | 0 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.80513606 | 0.74612585 |
| Coefficient of variation (CV) | 2.1502436 | 2.1195677 |
| Kurtosis | 10.897371 | 8.1967426 |
| Mean | 0.37443946 | 0.35201794 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.8525129 | 2.5537868 |
| Sum | 167 | 157 |
| Variance | 0.64824407 | 0.55670378 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=7)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 342 | 76.7% |
| 1 | 57 | 12.8% |
| 2 | 39 | 8.7% |
| 3 | 4 | 0.9% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 4 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 345 | 77.4% |
| 1 | 55 | 12.3% |
| 2 | 41 | 9.2% |
| 5 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 342 | 76.7% |
| 1 | 57 | 12.8% |
| 2 | 39 | 8.7% |
| 3 | 4 | 0.9% |
| 4 | 1 | 0.2% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 345 | 77.4% |
| 1 | 55 | 12.3% |
| 2 | 41 | 9.2% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 5 | 2 | 0.4% |
Ticket
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 382 | 389 |
| Distinct (%) | 85.7% | 87.2% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 7.0538117 | 6.6793722 |
| Min length | 3 | 4 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 331 | 344 |
| Unique (%) | 74.2% | 77.1% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 372622 | 2677 |
| 2nd row | 8471 | 112379 |
| 3rd row | 323951 | 2700 |
| 4th row | 17421 | CA 2144 |
| 5th row | 348123 | P/PP 3381 |
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| pc | 30 | 5.1% |
| c.a | 16 | 2.7% |
| a/5 | 11 | 1.9% |
| ca | 8 | 1.4% |
| soton/o.q | 8 | 1.4% |
| 2 | 7 | 1.2% |
| ston/o | 7 | 1.2% |
| 2343 | 5 | 0.9% |
| soton/oq | 5 | 0.9% |
| w./c | 4 | 0.7% |
| Other values (403) | 482 | 82.7% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| pc | 32 | 5.7% |
| c.a | 11 | 1.9% |
| ca | 10 | 1.8% |
| a/5 | 9 | 1.6% |
| soton/o.q | 5 | 0.9% |
| w./c | 5 | 0.9% |
| 2343 | 5 | 0.9% |
| ston/o | 4 | 0.7% |
| 2 | 4 | 0.7% |
| a/4 | 4 | 0.7% |
| Other values (408) | 476 | 84.2% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 384 | 12.2% |
| 1 | 348 | 11.1% |
| 2 | 304 | 9.7% |
| 7 | 235 | 7.5% |
| 6 | 211 | 6.7% |
| 0 | 210 | 6.7% |
| 4 | 210 | 6.7% |
| 5 | 203 | 6.5% |
| 9 | 159 | 5.1% |
| 8 | 145 | 4.6% |
| Other values (22) | 737 | 23.4% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 359 | 12.1% |
| 1 | 354 | 11.9% |
| 2 | 271 | 9.1% |
| 7 | 255 | 8.6% |
| 4 | 245 | 8.2% |
| 0 | 215 | 7.2% |
| 6 | 202 | 6.8% |
| 5 | 193 | 6.5% |
| 9 | 158 | 5.3% |
| 8 | 142 | 4.8% |
| Other values (17) | 585 | 19.6% |
Most occurring categories
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 3146 | 100.0% |
| Dataset B | (unknown) | 2979 | 100.0% |
Most occurring scripts
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 3146 | 100.0% |
| Dataset B | (unknown) | 2979 | 100.0% |
Most occurring blocks
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 3146 | 100.0% |
| Dataset B | (unknown) | 2979 | 100.0% |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 170 | 173 |
| Distinct (%) | 38.1% | 38.8% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.869544 | 34.032454 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 11 | 9 |
| Zeros (%) | 2.5% | 2.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.05 | 7.162525 |
| Q1 | 7.925 | 7.8958 |
| median | 14.4542 | 14.4542 |
| Q3 | 29.7 | 31.275 |
| 95-th percentile | 108.28125 | 130.2375 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 21.775 | 23.3792 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 44.323541 | 54.25907 |
| Coefficient of variation (CV) | 1.4839042 | 1.5943331 |
| Kurtosis | 37.58623 | 30.286601 |
| Mean | 29.869544 | 34.032454 |
| Median Absolute Deviation (MAD) | 7.2042 | 7.2042 |
| Skewness | 4.9118206 | 4.6439131 |
| Sum | 13321.817 | 15178.475 |
| Variance | 1964.5763 | 2944.0467 |
| Monotonicity | Not monotonic | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 8.05 | 25 | 5.6% |
| 13 | 23 | 5.2% |
| 26 | 18 | 4.0% |
| 7.75 | 16 | 3.6% |
| 10.5 | 12 | 2.7% |
| 0 | 11 | 2.5% |
| 7.925 | 10 | 2.2% |
| 7.8958 | 9 | 2.0% |
| 7.8542 | 8 | 1.8% |
| 8.6625 | 8 | 1.8% |
| Other values (160) | 306 | 68.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 23 | 5.2% |
| 8.05 | 21 | 4.7% |
| 7.75 | 18 | 4.0% |
| 26 | 18 | 4.0% |
| 7.8958 | 17 | 3.8% |
| 10.5 | 11 | 2.5% |
| 26.55 | 10 | 2.2% |
| 8.6625 | 10 | 2.2% |
| 0 | 9 | 2.0% |
| 7.775 | 9 | 2.0% |
| Other values (163) | 300 | 67.3% |
Minimum 10 values
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 11 | 2.5% |
| 6.4958 | 2 | 0.4% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 6.975 | 2 | 0.4% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 6 | 1.3% |
| 7.0542 | 2 | 0.4% |
| 7.125 | 4 | 0.9% |
| 7.1417 | 1 | 0.2% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 9 | 2.0% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 2 | 0.4% |
| 6.975 | 1 | 0.2% |
| 7.05 | 5 | 1.1% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 1 | 0.2% |
| 7.1417 | 1 | 0.2% |
| 7.225 | 8 | 1.8% |
Cabin
Text
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 88 | 95 |
| Distinct (%) | 81.5% | 88.0% |
| Missing | 338 | 338 |
| Missing (%) | 75.8% | 75.8% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 11 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.5092593 | 3.6111111 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 69 | 85 |
| Unique (%) | 63.9% | 78.7% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | F G73 | E49 |
| 2nd row | C123 | C103 |
| 3rd row | F G73 | B94 |
| 4th row | E63 | C123 |
| 5th row | D6 | F G73 |
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| b96 | 3 | 2.4% |
| b98 | 3 | 2.4% |
| f | 3 | 2.4% |
| g73 | 2 | 1.6% |
| c22 | 2 | 1.6% |
| c26 | 2 | 1.6% |
| c123 | 2 | 1.6% |
| b58 | 2 | 1.6% |
| b60 | 2 | 1.6% |
| e24 | 2 | 1.6% |
| Other values (85) | 100 | 81.3% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| g6 | 3 | 2.4% |
| c23 | 3 | 2.4% |
| c25 | 3 | 2.4% |
| c27 | 3 | 2.4% |
| b96 | 3 | 2.4% |
| b98 | 3 | 2.4% |
| e67 | 2 | 1.6% |
| f2 | 2 | 1.6% |
| e8 | 2 | 1.6% |
| c123 | 2 | 1.6% |
| Other values (94) | 101 | 79.5% |
Most occurring characters
Dataset A
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 40 | 10.6% |
| C | 37 | 9.8% |
| 1 | 37 | 9.8% |
| B | 31 | 8.2% |
| 6 | 27 | 7.1% |
| 3 | 27 | 7.1% |
| E | 21 | 5.5% |
| 8 | 21 | 5.5% |
| 9 | 19 | 5.0% |
| 7 | 18 | 4.7% |
| Other values (9) | 101 | 26.6% |
Dataset B
| Value | Count | Frequency (%) |
|---|---|---|
| C | 41 | 10.5% |
| 2 | 38 | 9.7% |
| B | 35 | 9.0% |
| 3 | 32 | 8.2% |
| 6 | 30 | 7.7% |
| 1 | 29 | 7.4% |
| 9 | 22 | 5.6% |
| 8 | 21 | 5.4% |
| 5 | 20 | 5.1% |
| 7 | 19 | 4.9% |
| Other values (9) | 103 | 26.4% |
Most occurring categories
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 379 | 100.0% |
| Dataset B | (unknown) | 390 | 100.0% |
Most occurring scripts
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 379 | 100.0% |
| Dataset B | (unknown) | 390 | 100.0% |
Most occurring blocks
| | Value | Count | Frequency (%) |
|---|---|---|---|
| Dataset A | (unknown) | 379 | 100.0% |
| Dataset B | (unknown) | 390 | 100.0% |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 1 | 1 |
| Missing (%) | 0.2% | 0.2% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Q | C |
| 2nd row | S | C |
| 3rd row | S | C |
| 4th row | C | S |
| 5th row | S | C |
Common Values
| Value | Count | Frequency (%) |
| S | 333 | 74.7% |
| C | 73 | 16.4% |
| Q | 39 | 8.7% |
| (Missing) | 1 | 0.2% |
| Value | Count | Frequency (%) |
| S | 332 | 74.4% |
| C | 83 | 18.6% |
| Q | 30 | 6.7% |
| (Missing) | 1 | 0.2% |
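The Frequency (%) column in the Common Values tables appears to be each count divided by the total number of observations (446, per the dataset statistics), so the missing value gets its own share. A minimal sketch of that arithmetic for Dataset A, assuming one-decimal rounding; the variable names are illustrative only:

```python
# Counts for Embarked in Dataset A, copied from the Common Values table.
counts = {"S": 333, "C": 73, "Q": 39, "(Missing)": 1}

# Assumption: the denominator is the number of observations, missing included.
n_observations = sum(counts.values())  # 446

# Share of each value, rounded to one decimal place.
frequency_pct = {
    value: round(100 * count / n_observations, 1)
    for value, count in counts.items()
}

for value, pct in frequency_pct.items():
    print(f"{value}: {pct}%")
```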
Length
Histogram of lengths of the category
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 333 | 74.8% |
| c | 73 | 16.4% |
| q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| s | 332 | 74.6% |
| c | 83 | 18.7% |
| q | 30 | 6.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 333 | 74.8% |
| C | 73 | 16.4% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 332 | 74.6% |
| C | 83 | 18.7% |
| Q | 30 | 6.7% |
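Because every Embarked value is a single character, the character counts match the category counts. The percentages differ slightly from Common Values (8.8% versus 8.7% for Q in Dataset A), apparently because the denominator here is the 445 non-missing characters rather than all 446 rows. A sketch under that assumption, rebuilding the Dataset A column from its reported counts:

```python
from collections import Counter

# Non-missing Embarked values in Dataset A, rebuilt from the reported counts.
values = ["S"] * 333 + ["C"] * 73 + ["Q"] * 39

# Count every character across all values (one character per value here).
char_counts = Counter("".join(values))
total_chars = sum(char_counts.values())

# Assumption: percentages are relative to the total number of characters.
char_freq = {
    ch: round(100 * n / total_chars, 1)
    for ch, n in char_counts.most_common()
}

print(total_chars, char_freq)
```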
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| S | 333 | 74.8% |
| C | 73 | 16.4% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 332 | 74.6% |
| C | 83 | 18.7% |
| Q | 30 | 6.7% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| S | 333 | 74.8% |
| C | 73 | 16.4% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 332 | 74.6% |
| C | 83 | 18.7% |
| Q | 30 | 6.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
| Value | Count | Frequency (%) |
| (unknown) | 445 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| S | 333 | 74.8% |
| C | 73 | 16.4% |
| Q | 39 | 8.8% |
| Value | Count | Frequency (%) |
| S | 332 | 74.6% |
| C | 83 | 18.7% |
| Q | 30 | 6.7% |
Interactions
[Interaction plots for Dataset A and Dataset B]
Correlations
[Correlation heatmaps for Dataset A and Dataset B]
Dataset A
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.109 | 0.160 | -0.232 | -0.001 | 0.269 | 0.149 | -0.192 | 0.239 |
| Embarked | 0.109 | 1.000 | 0.197 | 0.000 | 0.000 | 0.244 | 0.123 | 0.138 | 0.177 |
| Fare | 0.160 | 0.197 | 1.000 | 0.406 | -0.053 | 0.449 | 0.146 | 0.410 | 0.167 |
| Parch | -0.232 | 0.000 | 0.406 | 1.000 | 0.021 | 0.000 | 0.300 | 0.439 | 0.151 |
| PassengerId | -0.001 | 0.000 | -0.053 | 0.021 | 1.000 | 0.000 | 0.050 | -0.065 | 0.071 |
| Pclass | 0.269 | 0.244 | 0.449 | 0.000 | 0.000 | 1.000 | 0.036 | 0.113 | 0.317 |
| Sex | 0.149 | 0.123 | 0.146 | 0.300 | 0.050 | 0.036 | 1.000 | 0.226 | 0.492 |
| SibSp | -0.192 | 0.138 | 0.410 | 0.439 | -0.065 | 0.113 | 0.226 | 1.000 | 0.138 |
| Survived | 0.239 | 0.177 | 0.167 | 0.151 | 0.071 | 0.317 | 0.492 | 0.138 | 1.000 |
Dataset B
| | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived |
|---|---|---|---|---|---|---|---|---|---|
| Age | 1.000 | 0.000 | 0.183 | -0.258 | -0.054 | 0.307 | 0.161 | -0.123 | 0.215 |
| Embarked | 0.000 | 1.000 | 0.209 | 0.000 | 0.000 | 0.257 | 0.083 | 0.000 | 0.152 |
| Fare | 0.183 | 0.209 | 1.000 | 0.424 | -0.053 | 0.480 | 0.187 | 0.466 | 0.254 |
| Parch | -0.258 | 0.000 | 0.424 | 1.000 | 0.035 | 0.000 | 0.287 | 0.472 | 0.182 |
| PassengerId | -0.054 | 0.000 | -0.053 | 0.035 | 1.000 | 0.000 | 0.076 | -0.000 | 0.065 |
| Pclass | 0.307 | 0.257 | 0.480 | 0.000 | 0.000 | 1.000 | 0.114 | 0.163 | 0.303 |
| Sex | 0.161 | 0.083 | 0.187 | 0.287 | 0.076 | 0.114 | 1.000 | 0.164 | 0.538 |
| SibSp | -0.123 | 0.000 | 0.466 | 0.472 | -0.000 | 0.163 | 0.164 | 1.000 | 0.203 |
| Survived | 0.215 | 0.152 | 0.254 | 0.182 | 0.065 | 0.303 | 0.538 | 0.203 | 1.000 |
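The overview raises a high-correlation alert for Sex and Survived in Dataset B only. One reading consistent with the two matrices is that the alert fires when an off-diagonal coefficient reaches 0.5 in absolute value (0.492 in Dataset A, 0.538 in Dataset B). The sketch below checks that reading; the 0.5 threshold and the `strong_pairs` helper are assumptions for illustration, not something the report states.

```python
COLUMNS = ["Age", "Embarked", "Fare", "Parch", "PassengerId",
           "Pclass", "Sex", "SibSp", "Survived"]

# Correlation matrices copied from the tables above (row order = COLUMNS).
MATRIX_A = [
    [1.000, 0.109, 0.160, -0.232, -0.001, 0.269, 0.149, -0.192, 0.239],
    [0.109, 1.000, 0.197, 0.000, 0.000, 0.244, 0.123, 0.138, 0.177],
    [0.160, 0.197, 1.000, 0.406, -0.053, 0.449, 0.146, 0.410, 0.167],
    [-0.232, 0.000, 0.406, 1.000, 0.021, 0.000, 0.300, 0.439, 0.151],
    [-0.001, 0.000, -0.053, 0.021, 1.000, 0.000, 0.050, -0.065, 0.071],
    [0.269, 0.244, 0.449, 0.000, 0.000, 1.000, 0.036, 0.113, 0.317],
    [0.149, 0.123, 0.146, 0.300, 0.050, 0.036, 1.000, 0.226, 0.492],
    [-0.192, 0.138, 0.410, 0.439, -0.065, 0.113, 0.226, 1.000, 0.138],
    [0.239, 0.177, 0.167, 0.151, 0.071, 0.317, 0.492, 0.138, 1.000],
]
MATRIX_B = [
    [1.000, 0.000, 0.183, -0.258, -0.054, 0.307, 0.161, -0.123, 0.215],
    [0.000, 1.000, 0.209, 0.000, 0.000, 0.257, 0.083, 0.000, 0.152],
    [0.183, 0.209, 1.000, 0.424, -0.053, 0.480, 0.187, 0.466, 0.254],
    [-0.258, 0.000, 0.424, 1.000, 0.035, 0.000, 0.287, 0.472, 0.182],
    [-0.054, 0.000, -0.053, 0.035, 1.000, 0.000, 0.076, -0.000, 0.065],
    [0.307, 0.257, 0.480, 0.000, 0.000, 1.000, 0.114, 0.163, 0.303],
    [0.161, 0.083, 0.187, 0.287, 0.076, 0.114, 1.000, 0.164, 0.538],
    [-0.123, 0.000, 0.466, 0.472, -0.000, 0.163, 0.164, 1.000, 0.203],
    [0.215, 0.152, 0.254, 0.182, 0.065, 0.303, 0.538, 0.203, 1.000],
]


def strong_pairs(matrix, columns, threshold=0.5):
    """Return (col_i, col_j, value) for upper-triangle entries with |value| >= threshold."""
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):  # skip the diagonal and mirrored entries
            if abs(matrix[i][j]) >= threshold:
                pairs.append((columns[i], columns[j], matrix[i][j]))
    return pairs


print("Dataset A:", strong_pairs(MATRIX_A, COLUMNS))
print("Dataset B:", strong_pairs(MATRIX_B, COLUMNS))
```

Under this assumed threshold, Dataset A yields no pairs and Dataset B yields only Sex and Survived, which matches the alerts in the overview.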
Missing values
Dataset A
A simple visualization of nullity by column.
Dataset B
A simple visualization of nullity by column.
Dataset A
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dataset B
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
Dataset A
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
Dataset B
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
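One common way to compute a nullity correlation is the Pearson correlation between per-column presence indicators (1 if the value is present, 0 if missing). The sketch below assumes that definition; whether the report uses exactly this computation is not stated. The toy rows and the `Deck` column are hypothetical and are not taken from either dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A column that is always present (or always missing) has no variation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def nullity_correlation(rows, col_a, col_b):
    """Correlate the presence (1) or absence (0) of two columns across rows."""
    present_a = [0 if row[col_a] is None else 1 for row in rows]
    present_b = [0 if row[col_b] is None else 1 for row in rows]
    return pearson(present_a, present_b)


# Hypothetical toy rows; None marks a missing value.
rows = [
    {"Age": 22.0, "Cabin": None, "Deck": None},
    {"Age": None, "Cabin": "C85", "Deck": "C"},
    {"Age": 26.0, "Cabin": None, "Deck": None},
    {"Age": 35.0, "Cabin": "C123", "Deck": "C"},
]

print(nullity_correlation(rows, "Cabin", "Deck"))  # missing together
print(nullity_correlation(rows, "Age", "Cabin"))   # partly opposed
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means their missingness is unrelated.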
Sample
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 560 | 561 | 0 | 3 | Morrow, Mr. Thomas Rowan | male | NaN | 0 | 0 | 372622 | 7.7500 | NaN | Q |
| 769 | 770 | 0 | 3 | Gronnestad, Mr. Daniel Danielsen | male | 32.0 | 0 | 0 | 8471 | 8.3625 | NaN | S |
| 372 | 373 | 0 | 3 | Beavan, Mr. William Thomas | male | 19.0 | 0 | 0 | 323951 | 8.0500 | NaN | S |
| 306 | 307 | 1 | 1 | Fleming, Miss. Margaret | female | NaN | 0 | 0 | 17421 | 110.8833 | NaN | C |
| 75 | 76 | 0 | 3 | Moen, Mr. Sigurd Hansen | male | 25.0 | 0 | 0 | 348123 | 7.6500 | F G73 | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 790 | 791 | 0 | 3 | Keane, Mr. Andrew "Andy" | male | NaN | 0 | 0 | 12460 | 7.7500 | NaN | Q |
| 210 | 211 | 0 | 3 | Ali, Mr. Ahmed | male | 24.0 | 0 | 0 | SOTON/O.Q. 3101311 | 7.0500 | NaN | S |
| 844 | 845 | 0 | 3 | Culumovic, Mr. Jeso | male | 17.0 | 0 | 0 | 315090 | 8.6625 | NaN | S |
| 348 | 349 | 1 | 3 | Coutts, Master. William Loch "William" | male | 3.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36 | 37 | 1 | 3 | Mamee, Mr. Hanna | male | NaN | 0 | 0 | 2677 | 7.2292 | NaN | C |
| 766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C |
| 524 | 525 | 0 | 3 | Kassem, Mr. Fared | male | NaN | 0 | 0 | 2700 | 7.2292 | NaN | C |
| 683 | 684 | 0 | 3 | Goodwin, Mr. Charles Edward | male | 14.00 | 5 | 2 | CA 2144 | 46.9000 | NaN | S |
| 874 | 875 | 1 | 2 | Abelson, Mrs. Samuel (Hannah Wizosky) | female | 28.00 | 1 | 0 | P/PP 3381 | 24.0000 | NaN | C |
| 722 | 723 | 0 | 2 | Gillespie, Mr. William Henry | male | 34.00 | 0 | 0 | 12233 | 13.0000 | NaN | S |
| 469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C |
| 188 | 189 | 0 | 3 | Bourke, Mr. John | male | 40.00 | 1 | 1 | 364849 | 15.5000 | NaN | Q |
| 443 | 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | female | 28.00 | 0 | 0 | 230434 | 13.0000 | NaN | S |
| 535 | 536 | 1 | 2 | Hart, Miss. Eva Miriam | female | 7.00 | 0 | 2 | F.C.C. 13529 | 26.2500 | NaN | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 277 | 278 | 0 | 2 | Parkes, Mr. Francis "Frank" | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S |
| 478 | 479 | 0 | 3 | Karlsson, Mr. Nils August | male | 22.0 | 0 | 0 | 350060 | 7.5208 | NaN | S |
| 784 | 785 | 0 | 3 | Ali, Mr. William | male | 25.0 | 0 | 0 | SOTON/O.Q. 3101312 | 7.0500 | NaN | S |
| 465 | 466 | 0 | 3 | Goncalves, Mr. Manuel Estanslas | male | 38.0 | 0 | 0 | SOTON/O.Q. 3101306 | 7.0500 | NaN | S |
| 806 | 807 | 0 | 1 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0000 | A36 | S |
| 802 | 803 | 1 | 1 | Carter, Master. William Thornton II | male | 11.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 880 | 881 | 1 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25.0 | 0 | 1 | 230433 | 26.0000 | NaN | S |
| 702 | 703 | 0 | 3 | Barbara, Miss. Saiide | female | 18.0 | 0 | 1 | 2691 | 14.4542 | NaN | C |
| 169 | 170 | 0 | 3 | Ling, Mr. Lee | male | 28.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 400 | 401 | 1 | 3 | Niskanen, Mr. Juha | male | 39.0 | 0 | 0 | STON/O 2. 3101289 | 7.9250 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 855 | 856 | 1 | 3 | Aks, Mrs. Sam (Leah Rosen) | female | 18.0 | 0 | 1 | 392091 | 9.3500 | NaN | S |
| 565 | 566 | 0 | 3 | Davies, Mr. Alfred J | male | 24.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S |
| 276 | 277 | 0 | 3 | Lindblom, Miss. Augusta Charlotta | female | 45.0 | 0 | 0 | 347073 | 7.7500 | NaN | S |
| 131 | 132 | 0 | 3 | Coelho, Mr. Domingos Fernandeo | male | 20.0 | 0 | 0 | SOTON/O.Q. 3101307 | 7.0500 | NaN | S |
| 780 | 781 | 1 | 3 | Ayoub, Miss. Banoura | female | 13.0 | 0 | 0 | 2687 | 7.2292 | NaN | C |
| 376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S |
| 666 | 667 | 0 | 2 | Butler, Mr. Reginald Fenton | male | 25.0 | 0 | 0 | 234686 | 13.0000 | NaN | S |
| 355 | 356 | 0 | 3 | Vanden Steen, Mr. Leo Peter | male | 28.0 | 0 | 0 | 345783 | 9.5000 | NaN | S |
| 394 | 395 | 1 | 3 | Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson) | female | 24.0 | 0 | 2 | PP 9549 | 16.7000 | G6 | S |
| 55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S |
Duplicate rows
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.